A safe MkdirAll implementation #13

cyphar · 2024-06-26T10:00:20Z

A fairly common usage of SecureJoin is of the form:

path, _ := securejoin.SecureJoin(root, nonExistentPath)
_ = os.MkdirAll(path, 0o755)

Unfortunately, such usage is not safe if the root can be modified
between SecureJoin and MkdirAll. This is a known issue with SecureJoin
(using path strings for the API means we cannot be sure the path is
safe).

However, by carefully using file handles it is possible to adapt
SecureJoin to be safe to use in a race-free way with MkdirAll by merging
them into a single function.

We have to update the minimum Go version to 1.20 so that we can have
errors that contain more than one wrapped error.

The later patches include hardening using openat2 and fsopen.

Signed-off-by: Aleksa Sarai cyphar@cyphar.com

Things left to do:

alban

Thanks for working on this.

I'm adding a couple of comments.

mkdir_linux.go

procfs_linux.go

rata

Left some comments, I focused on the idea first and suggested alternatives for the !openat2 case.

I haven't checked the rest in much detail, let's see if those ideas make sense to you first.

mkdir_linux.go

rata · 2024-06-26T15:42:23Z

mkdir_linux.go

+// can modify the filesystem tree between SecureJoin and MkdirAll, it is
+// possible for MkdirAll to resolve unsafe symlink components and create
+// directories outside of the root.
+func MkdirAll(root, unsafePath string, mode os.FileMode) error {


What do you think of doing something simpler, and maybe slower, for the non openat2 case?

Openat2 is supported since 5.6, so lot of distros already have it.

Something like:

createdPath := "" for path in unsafePath { // unsafePath here can be something like strings.Split(unsafePath, string(filepath.Separator)) //NOTE: not exactly as utils.WithProcFd, as that passes a string, we want the fd. So this modified verson should use fh.Fd() instead. utils.WithProcFd(root, createdPath), func (basePath int) { // TODO: handle the case path exists unix.MkdirAt(basePath, path) } createdPath = createdPath + path }

In my mind, it seems semantically simpler (and simpler to see it is correct) if we handle the non-openat2() case using securejoin.Join.

I guess such a closure in go should work (or can be adapted to take path as param too).

This might be slower due to opening the proc fd, and maybe you can do symlink attacks that live inside the rootfs (I don't think it is a problem, it is as if that change was there from the beginning).

Too bad that SecureJoin doesn't return the fd. Although that might be another option to explore: a SecureJoin() variant that returns the fd too. In essence it would be like this utils.WithProcFd() we use in runc, and we just use securejoinFd() and mkdirat() in a loop.

What do you think?

Aside from being considerably less efficient (doing MkdirAll this way would result in O(n^2) lookups at least because we have to look up each component each time we add a component), SecureJoin is not a safe API to use even if you verify through /proc/self/fd for the reasons I outlined in #12.

Also, the current implementation has the same logic after we get the path we want from either partialOpen or partialOpenat2. You would need to have two entirely different implementations if you do it that way.

Too bad that SecureJoin doesn't return the fd. Although that might be another option to explore: a SecureJoin() variant that returns the fd too.

That is exactly what partialOpen does, it's just that rather than erroring out if the path doesn't exist it returns the last path it got to and the remaining components. If you don't return that, you get O(n^2) lookups because you need that information for MkdirAll.

partialOpen is basically a better SecureJoin.

I suspect you won't like the new symlink stack I needed to add to partialOpen to make the behaviour match openat2...

Heh. The more the complexity grows, the more I think O(n^2) for old kernels (4+ years) might not be so bad :-D

I wish there was an mkdirat2() that does the mkdir and returns an fd to the created entry on success (O_PATH, etc.). That would really simplify things and remove races here.

And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p.

Clearly secureJoin can be implemented by using partialOpen() and in the future we should probably do that. This might be ok for now.

Heh. The more the complexity grows, the more I think O(n^2) for old kernels (4+ years) might not be so bad :-D

After adding the symlink stack, I don't think you could reasonably emulate the symlink behaviour with SecureJoin.

I wish there was an mkdirat2() that does the mkdir and returns an fd to the created entry on success (O_PATH, etc.). That would really simplify things and remove races here.

I suggested it, Al rejected it.

Now that you mention it, we should probably check the owner and permission bits (and whether the directory is empty, I guess) though. If any of those properties are wrong, then we're in a directory the attacker swapped. If the owner and permission bits are the same (and the directory is empty), then even if the attacker did swap the directory, there is no difference in what they can do (if they have the same permissions as us, they can mess with our directory after we created it anyway). Done!

And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p. Clearly secureJoin can be implemented by using partialOpen() and in the future we should probably do that. This might be ok for now.

In principle, you could delete most of SecureJoin and just use partialOpen. For paths that exist, the semantics would be identical. I would love to remove as much as of the duplication between SecureJoin and partialOpen as possible, don't get me wrong!

However, I didn't do it in this PR because the semantics are noticeably different when dealing with dangling symlinks and non-existent paths, and I am not entirely sure you can get all of the semantics to be completely identical. SecureJoin's behavior for dangling symlinks in particular is something that we can't emulate with openat2 AFAICS.

I would argue one of the main API problems with SecureJoin is precisely the fact that you can use it with non-existent paths. There really isn't a reasonable result to give when you ask for a non-existent path.

EDIT: One option might be to add a flag to partialOpen which will cause it treat the remainingPath as a lexical path and try to continue resolving it after doing a path.Clean. This would let us emulate SecureJoin(root, "a/b/nonexistent/../foo") (we hit nonexistent/../foo as the remainingPath and this gets cleaned to foo and we continue to try to do the lookup). However, dangling symlinks would still need to result in an error, which would be an API change. I'll make a separate PR with a patch showing how that would work.

(I was thinking of looking into this for a possible v2 of securejoin, depending on whether it's actually necessary. Tbh, I consider SecureJoin to be somewhat legacy code with a borked API and weird quirks. Unless it's easy to guarantee the behaviour is identical, I'm not sure a refactor is a good idea...)

(I'll answer to the rest in more detail later, thanks for proving so many details!)

However, I didn't do it in this PR because the semantics are noticeably different when dealing with dangling symlinks and non-existent paths, and I am not entirely sure you can get all of the semantics to be completely identical.

You mentioned the dangling symlink behaviour changes several times, also in commits IIRC, but never what is the difference. Can you please explain what is exactly the difference?

That is very useful to effectively review the PR (to see the behavior matches, to see if tests are covering all the interesting cases, etc.).

Let's start here on github, then we can see if we want that info just on the commit msg or also in comments on the code. But I doubt we don't want to preserve those details very explicitely (just "there are differences" is not as useful as listing the differences per-se :)).

So, in a case like:

% mkdir -p root/a/b % ln -s a/b/doesnt-exist/c root/link

where link points to a path that doesn't exist, it is a dangling symlink.

SecureJoin("root", "link/foo") will return a/b/doesnt-exist/c/foo. This is because all symlinks encountered are expanded and treated as though they were part of the original unsafePath. (In the context of SecureJoin, this behaviour makes sense -- because it doesn't mind returning non-existent paths, and so a dangling symlink is treated the same as any other broken path.) So in a sense, symlink resolutions are "inatomic".

The old behaviour of partialOpen (based on SecureJoin) had similar semantics, and would result in a/b being opened and then doesnt-exist/c being created.

This may seem like totally reasonable behaviour, however it is not possible to easily emulate this with openat2 because openat2 has to resolve the path completely, and so a dangling symlink is never resolvable (basically, walking into a symlink is an atomic operation -- either the symlink can be resolved fully or it's dangling and you get -ENOTDIR/-ENOENT). If we wanted to emulate the SecureJoin behaviour with openat2 we would need to resolve symlinks manually and re-do openat2 for each component of a symlink (which could lead to a DoS if you had very long paths for each symlink).

Instead, the new partialOpen behaviour is designed to match openat2 -- if we are in the middle of resolving a symlink and hit a non-existent component of the symlink, we pretend as though opening the symlink itself gave us -ENOENT.

mkdirall: make partialOpen semantically identical to openat2 implementation is the implementation of this, and the following cases in testMkdirAllBasic check this behaviour:

All of the a-fake{1,2,3} tests.

The link3/deep_dangling{1,2}/foo tests (which check the case where you have deeper symlink resolution chains).

(The key thing is that the behaviour is identical when using partialOpen and when using openat2.)

I'll work on some partialLookupInRoot-specific tests for this behaviour next week.

cyphar · 2024-06-28T07:10:56Z

I might want to add some more tests later, but IMHO this is ready for review now that all of the features are properly tested.

PTAL @alban @rata

rata

This is going to take a while to review. I'll just post this two superficial comments and I'll try to focus on the big scheme of things now.

openat2_linux.go

procfs_linux.go

rata

Added some more comments.

I'd really really like a simpler PR (~1400 lines added is a lot!)

util_linux_test.go

rata · 2024-06-28T16:33:41Z

mkdir_linux.go

+// can modify the filesystem tree between SecureJoin and MkdirAll, it is
+// possible for MkdirAll to resolve unsafe symlink components and create
+// directories outside of the root.
+func MkdirAll(root, unsafePath string, mode os.FileMode) error {


And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p.

Clearly secureJoin can be implemented by using partialOpen() and in the future we should probably do that. This might be ok for now.

mkdir_linux.go

A fairly common usage of SecureJoin is of the form: path, _ := securejoin.SecureJoin(root, nonExistentPath) _ = os.MkdirAll(path, 0o755) Unfortunately, such usage is not safe if the root can be modified between SecureJoin and MkdirAll. This is a known issue with SecureJoin (using path strings for the API means we cannot be sure the path is safe). However, by carefully using file handles it is possible to adapt SecureJoin to be safe to use in a race-free way with MkdirAll by merging them into a single function. The implementation is designed so that we can easily add openat2 support, which is done in a follow-up commit. MkdirAllHandle is a variant of MkdirAll which takes the root as an *os.File and returns an *os.File for the final part of the created directory tree. This is a preferable interface to use if you need a handle to the final directory. We have to update the minimum Go version to 1.20 so that we can have errors that contain more than one wrapped error. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

An attacker could swap out the directory we created between us creating it and opening it. However, the worst thing they could do is force us to make a directory inside a directory with different permissions or a different mode. If they swap our directory with a directory that has the same owner and permissions as the one we created, they didn't gain anything. So, verify that the directory looks like the one we would've created. Unfortunately, there is no way to atomically create a directory and open it. I discussed this with Al a while ago and I believe he said he didn't like it (though I'm not sure if it was a blanket veto or not). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

Rather than saving a handle to /proc/self/fd, save a /proc handle and use /proc/thread-self/fd. This is necessary to handle Go programs with locked threads that have unshare(CLONE_FS), such as runc. This is modelled on similar logic that exists in runc (ProcThreadSelf), and is necessary if we want to use MkdirAll in runc. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

While adding openat2 support to SecureJoin doesn't make much sense (the API is wrong for handling race-safe paths), MkdirAll can use openat2 internally to make sure we safely operate on /proc and more efficiently do the partial-open necessary for MkdirAll. The safe /proc handling is based on similar logic in libpathrs. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

…tation openat2(2) will fail when we try to follow a broken symlink, which means that partialOpenat2() will treat the final symlink as a non-existent path. In contrast, partialOpen() will happily return a handle and remainingPath in the middle of the symlink resolution. While either of the above semantics are acceptable, we need to unify the behaviour so that users don't get different behaviour based on their kernel version. We can't change the behaviour of openat2(2), and emulating the current partialOpen() behaviour using openat2(2) is just asking for trouble. So, make partialOpen() match the behaviour of partialOpenat2(). Unfortunately, implementing this properly (including handling recursive symlinks) requires emulating a symlink stack and making sure that we return the top symlink we are in the mille of resolving if we hit a non-existent or otherwise invalid path. If there is no last symlink, we return the partial open as normal. Yeah, this is a little ugly, but I don't see a nicer way of implementing these semantics... Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

These are primarily correctness tests (81.3% code coverage), to make sure we handle all of the ugly corner-cases with nested and dangling symlinks correctly. Almost all of the untested code is related to errors (most of which are annoying to mock). The one set of error paths it might be worth adding errors for is the isDeadInode() checks. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

Because we depend on procfs to be correct when operating on other filesystems, having a safe procfs handle is vital. Unfortunately, if we are an administrative program running inside a container that can modify its mount configuration, our current checks are not sufficient to avoid being tricked into thinking a path is real. Luckily, with fsopen() and open_tree() it is possible to create a completely private procfs instance that an attacker cannot modify. Note that while they are both safe, they are safe for different reasons: 1. fsopen() is safe because the created mount is completely separate to any other procfs mount, so any changes to mounts on the host are irrelevant. fsopen() can fail if we trip the mnt_too_revealing() check, so we may have to fall back to open_tree() in some cases. 2. open_tree() creates a clone of a snapshot of the mount tree (or just the top mount if can avoid using AT_RECURSIVE, but mnt_too_revealing() may force us to use AT_RECURSIVE). While the tree we clone might have been messed with by an attacker, after cloning there is no way for the attacker to affect our clone (even mount propagation won't propagate into a clone[1]). The only risk is whether there are over-mounts with AT_RECURSIVE. Because anonymous mounts don't show up in mountinfo, the best we can do is check the mount id through statx to see whether a target has an overmount. We only do this for symlink operations (in particular, readlink) because openat2 handles the non-magiclink case for us and statx gained STATX_MNT_ID support in Linux 5.8. We could do this check only for the open_tree(AT_RECURSIVE) case, but it's simpler to always do it (of course, an attacker could add the mount after the check if we use the host /proc the "standard" way, but at least there's a chance we'll detect it). listmounts(2) would let us detect any overmounts at creation time, but it's not clear whether you want to error out if there is a mount over a path you never use (lxcfs has overmounts of /proc/cpuinfo and /proc/meminfo, which we never use). Unfortunately, we can only use these when running as a privileged user. In theory we could create a user namespace to gain the necessary privileges to create these handles, but this would require spawning a proper subprocess (CLONE_NEWUSER must be done in a single-threaded program, which is not possible in Go without using Go's ForkExec) which won't always work when filepath-securejoin is used as a library. It's also far too complicated to deal with in practice. This is based on similar logic I'm working on for libpathrs. [1]: This is true since at least Linux 5.12. See commit ee2e3f50629f ("mount: fix mounting of detached mounts onto targets that reside on shared mounts"). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

This ensures we test all of the fallbacks even if we don't use them in practice when doing MkdirAll in our test suite. In addition, add some hooks to allow the tests to force the usage of fallbacks when using the standard getProcRoot() function everything else uses. This raises the test coverage to 87.1%. In terms of correctness tests, there are a couple of extra tests it's probably worth adding (see procfs_linux_test.go for a list). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

This lets us make sure we are correctly handling all sorts of lookups that aren't obvious just by looking at MkdirAll. Of particular note are the tail-chained and other deep symlink resolution stacks (MkdirAll will just return an error, but the remainingPath we get is quite important). In addition, add some explicit tests for the error paths in symlinkStack. We can't force these with actual lookups (since they indicate a programming bug), but at least make sure we're testing them properly. This raises the test coverage to 88.6% now that we are testing more of the error paths in partialLookupInRoot. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

By combining the coverage from all platforms as well as privileged and non-privileged Linux runs, the coverage is now 87.7% for the entire project. It was necessary to split the Windows CI scripts because mktemp doesn't work in Powershell and it's necessary to reimplement the whole thing to get what you need (not to mention that -test.gocoverdir needs to be quoted in a particular way to not make Powershell angry). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

In order to get some confidence in saying that our code safe against these kinds of racing attacks, add some tests to cover this. The full suite (with 50k runs of each new subtest) takes about 10 minutes, so we should use "go test -short" for the full matrix and add a single separate test run which can run for longer (we don't need to merge the coverage from the stress test because we hit the same codepaths as in the short test). While working on this, I found what seems to be a kernel bug with readlink(/proc/self/fd/$n). There is a workaround in the tests for this, but it is possible that the test will become flaky. This brings the test coverage up to 91.1% on Linux (91.3% combined). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

cyphar · 2024-07-09T13:37:12Z

We now have 91.3% test coverage, including some pretty intense stress tests (so much so that I think I may have found a kernel bug in d_path.) I'm going to go ahead with merging this soon.

cyphar marked this pull request as draft June 26, 2024 10:00

This comment has been minimized.

Sign in to view

cyphar force-pushed the safe-mkdirall branch from 452b44b to 16fb358 Compare June 26, 2024 10:04

cyphar force-pushed the main branch from 94c9f5b to a6f3d45 Compare June 26, 2024 10:07

cyphar force-pushed the safe-mkdirall branch from 16fb358 to 4eb6df8 Compare June 26, 2024 10:07

cyphar mentioned this pull request Jun 26, 2024

Provide examples how to do os.OpenFile securely after SecureJoin() #12

Closed

cyphar force-pushed the main branch from a6f3d45 to b04d326 Compare June 26, 2024 10:16

cyphar force-pushed the safe-mkdirall branch from 4eb6df8 to 2f76de6 Compare June 26, 2024 10:17

alban reviewed Jun 26, 2024

View reviewed changes

mkdir_linux.go Outdated Show resolved Hide resolved

procfs_linux.go Show resolved Hide resolved

rata reviewed Jun 26, 2024

View reviewed changes

cyphar force-pushed the safe-mkdirall branch 11 times, most recently from 8191708 to a258c4f Compare June 28, 2024 07:00

cyphar marked this pull request as ready for review June 28, 2024 07:10

cyphar requested review from alban and rata June 28, 2024 07:11

rata reviewed Jun 28, 2024

View reviewed changes

openat2_linux.go Outdated Show resolved Hide resolved

procfs_linux.go Outdated Show resolved Hide resolved

cyphar force-pushed the safe-mkdirall branch from a258c4f to 56950fa Compare June 28, 2024 16:37

rata reviewed Jun 28, 2024

View reviewed changes

cyphar force-pushed the safe-mkdirall branch 2 times, most recently from 59c0417 to 157a7cd Compare June 29, 2024 06:53

cyphar force-pushed the safe-mkdirall branch 3 times, most recently from 5d9f872 to 78ce0fc Compare July 5, 2024 03:17

This was referenced Jul 5, 2024

mkdir_all() implementation openSUSE/libpathrs#10

Closed

safe procfs API openSUSE/libpathrs#15

Closed

/proc hardening openSUSE/libpathrs#7

Closed

cyphar force-pushed the safe-mkdirall branch 5 times, most recently from 0282597 to 35d1c38 Compare July 9, 2024 04:48

cyphar added 10 commits July 9, 2024 19:11

cyphar force-pushed the safe-mkdirall branch 3 times, most recently from 1be7dc5 to afbbbca Compare July 9, 2024 13:01

cyphar force-pushed the safe-mkdirall branch from afbbbca to ac32743 Compare July 9, 2024 13:36

cyphar merged commit 2192374 into main Jul 9, 2024
12 checks passed

cyphar deleted the safe-mkdirall branch July 9, 2024 14:19

cyphar mentioned this pull request Aug 7, 2024

proposal: os: safer file open functions golang/go#67002

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A safe MkdirAll implementation #13

A safe MkdirAll implementation #13

cyphar commented Jun 26, 2024 •

edited

Loading

This comment has been minimized.

alban left a comment

rata left a comment

rata Jun 26, 2024

cyphar Jun 26, 2024

cyphar Jun 27, 2024

rata Jun 28, 2024

rata Jun 28, 2024

cyphar Jun 28, 2024 •

edited

Loading

rata Jun 29, 2024 •

edited

Loading

cyphar Jun 29, 2024 •

edited

Loading

cyphar commented Jun 28, 2024

rata left a comment

rata left a comment

rata Jun 28, 2024

cyphar commented Jul 9, 2024

A safe MkdirAll implementation #13

A safe MkdirAll implementation #13

Conversation

cyphar commented Jun 26, 2024 • edited Loading

This comment has been minimized.

alban left a comment

Choose a reason for hiding this comment

rata left a comment

Choose a reason for hiding this comment

rata Jun 26, 2024

Choose a reason for hiding this comment

cyphar Jun 26, 2024

Choose a reason for hiding this comment

cyphar Jun 27, 2024

Choose a reason for hiding this comment

rata Jun 28, 2024

Choose a reason for hiding this comment

rata Jun 28, 2024

Choose a reason for hiding this comment

cyphar Jun 28, 2024 • edited Loading

Choose a reason for hiding this comment

rata Jun 29, 2024 • edited Loading

Choose a reason for hiding this comment

cyphar Jun 29, 2024 • edited Loading

Choose a reason for hiding this comment

cyphar commented Jun 28, 2024

rata left a comment

Choose a reason for hiding this comment

rata left a comment

Choose a reason for hiding this comment

rata Jun 28, 2024

Choose a reason for hiding this comment

cyphar commented Jul 9, 2024

cyphar commented Jun 26, 2024 •

edited

Loading

cyphar Jun 28, 2024 •

edited

Loading

rata Jun 29, 2024 •

edited

Loading

cyphar Jun 29, 2024 •

edited

Loading